Sublexical Translations for Low-Resource Language
Abstract
Machine translation (MT) for low-resource languages suffers from low coverage because of out-of-vocabulary (OOV) words. In this research, we propose a method that uses sublexical translation to achieve wide coverage in example-based machine translation (EBMT) from English to Bangla. For sublexical translation, we divide OOV words into sublexical units to obtain translation candidates. Previous methods without sublexical translation failed to find translation candidates for many joint words. Using WordNet and an IPA transliteration algorithm, we also propose translating OOV words together with an explanation. The proposed method handles OOV words better than previous approaches and improved translation quality by 20 points in human evaluation.
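The idea of dividing an OOV word into sublexical units can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon, the placeholder glosses such as BN(water), and the function names segment and translate_oov are all hypothetical, and the fewest-units segmentation rule is an assumption chosen for simplicity. The WordNet and IPA transliteration steps mentioned in the abstract are not modelled here.

```python
from functools import lru_cache

# Hypothetical toy lexicon: English units mapped to placeholder glosses.
# A real system would hold actual Bangla translations taken from its examples.
LEXICON = {
    "water": "BN(water)",
    "fall": "BN(fall)",
    "sun": "BN(sun)",
    "flower": "BN(flower)",
    "book": "BN(book)",
    "shop": "BN(shop)",
}


def segment(word, lexicon):
    """Split `word` into the fewest units that all occur in `lexicon`.

    Returns a tuple of units, or None if no full segmentation exists.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start):
        # Reaching the end of the word means the split so far is complete.
        if start == len(word):
            return ()
        candidates = []
        for end in range(start + 1, len(word) + 1):
            unit = word[start:end]
            if unit in lexicon:
                rest = best(end)
                if rest is not None:
                    candidates.append((unit,) + rest)
        # Prefer the segmentation with the fewest units (assumed heuristic).
        return min(candidates, key=len) if candidates else None

    return best(0)


def translate_oov(word, lexicon):
    """Return one translation candidate per sublexical unit, or None."""
    units = segment(word, lexicon)
    if units is None:
        return None
    return [lexicon[unit] for unit in units]


if __name__ == "__main__":
    for oov in ("waterfall", "sunflower", "xyz"):
        print(oov, "->", translate_oov(oov, LEXICON))
```

In this sketch, a word that cannot be fully segmented yields None. That is the point at which a fallback such as transliteration with an explanation, as the abstract describes, would presumably take over.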
Similar resources
Using Sublexical Translations to Handle the OOV Problem in MT
We introduce a method for learning to translate out-of-vocabulary (OOV) words. The method focuses on combining sublexical/constituent translations of an OOV to generate its translation candidates. In our approach, wildcard searches are formulated based on our OOV analysis, aimed at maximizing the probability of retrieving OOVs’ sublexical translations from existing resource of machine translati...
Leveraging translations for speech transcription in low-resource settings
Recently proposed data collection frameworks for endangered language documentation aim not only to collect speech in the language of interest, but also to collect translations into a high-resource language that will render the collected resource interpretable. We focus on this scenario and explore whether we can improve transcription quality under these extremely low-resource settings with the as...
Learning Translations for Tagged Words: Extending the Translation Lexicon of an ITG for Low Resource Languages
We tackle the challenge of learning part-of-speech classified translations as part of an inversion transduction grammar, by learning translations for English words with known part-of-speech tags, both from existing translation lexica and from parallel corpora. When translating from a low resource language into English, we can expect to have rich resources for English, such as treebanks, and smal...
Pivot-based word alignment
Word alignment is the task of, given two sentences that are translations of each other, determining which words correspond to each other across the two sentences. Word alignment is an important step in the pipeline of constructing a statistical machine translation system, but success at word alignment depends heavily on the quantity of training data available. The traditional methods for comput...
An Unsupervised Probability Model for Speech-to-Translation Alignment of Low-Resource Languages
For many low-resource languages, spoken language resources are more likely to be annotated with translations than with transcriptions. Translated speech data is potentially valuable for documenting endangered languages or for training speech translation systems. A first step towards making use of such data would be to automatically align spoken words with their translations. We present a model ...
Publication date: 2013